Statistical Identification of Collocations in Large Corpora for Information Retrieval

Author

Benjamin Lambert

Abstract

The linguistic phenomenon of collocation, the habitual juxtaposition of certain words in natural language, has been shown to benefit natural language processing tasks such as information retrieval. This paper examines the utility of several collocation extraction methods for document retrieval, specifically for queries in question form.

Similar resources

Identification of Noun-Noun (N-N) Collocations as Multi-Word Expressions in Bengali Corpus

Noun-Noun compounds, a subset of Compound Nouns as well as Nominal Compounds, play an important role in NLP applications such as Machine Translation and Information Retrieval because of their token frequency, type frequency, and occurrence in the world’s languages. Recognition of MWEs requires deep or shallow syntactic preprocessing tools and large corpora. The problem is quite difficult in Beng...

Retrieving Collocations from Text: Xtract

Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to ...

INFO256 Project Report Implementation and Evaluation of Xtract in WordSeer

Natural languages are full of word collocations that frequently co-occur and correspond to arbitrary word usages. They appear in both technical and non-technical textual corpora and often have specific significance in individual contexts. Accurately retrieving and identifying collocations from a given corpus in an unsupervised manner is imperative to understanding and automatically generating t...

Domain Collocation Identification

In this paper we present a new method of automatic collocation identification. Collocation is an important relation between words, which is widely used, among others, in information retrieval tasks. Over the last years, many methods of automatic collocation acquisition from text corpora have been proposed. The approach described in this paper differs from the others by focusing on domain colloc...

Collocation Mining: Exploiting Corpora for Collocation, Identification and Representation

The work presented provides computational linguistics methods and tools for collocation identification from arbitrary text, and methods and tools for representing collocations in a relational database integrating competence (collocation-type-specific linguistic analysis) and performance information (corpus sentences). The work differs from existing approaches to collocation identification in syste...

Publication date: 2006
